Skip to content

Simplify/unify XMPP element metrics #4520

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 11 commits into from
Apr 22, 2025
Merged

Conversation

chrzaszcz
Copy link
Member

@chrzaszcz chrzaszcz commented Apr 11, 2025

The main goal is to simplify and unify events for XMPP (XML) elements. The main benefit is that unified metrics become easier to understand, maintain, test and document.

Before, there would be pairs of events like c2s_element_in and xmpp_element_size_in. After, there is only one event: xmpp_element_in. The complete list is shown below:

  1. c2s_element_in (labels: host_type) and xmpp_element_size_in (labels: connection_type=c2s) are merged to xmpp_element_in (labels: host_type, connection_type=c2s).
  2. c2s_element_out (labels: host_type) and xmpp_element_size_out (labels: connection_type=c2s) are merged to xmpp_element_out (labels: host_type, connection_type=c2s).
  3. s2s_element_in and xmpp_element_size_in (labels: connection_type=s2s) are merged to xmpp_element_in (labels: host_type, connection_type=s2s).
  4. s2s_element_out and xmpp_element_size_out (labels: connection_type=s2s) are merged to xmpp_element_out (labels: host_type, connection_type=s2s).
  5. component_element_in and xmpp_element_size_in (labels: connection_type=component) are merged to xmpp_element_in (labels: host_type="", connection_type=component).
  6. component_element_out and xmpp_element_size_out (labels: connection_type=component) are merged to xmpp_element_out (labels: host_type="", connection_type=component).

Notes:

  • Ad 1, 2: c2s_element_* were missing if host type was unknown (e.g. in case of early stream errors). Now it is no longer the case.
  • Size-related events and all s2s/component events didn't include host types. Now they do if host type is known.
  • If host type is unknown, it is set to an empty binary in the labels. Prometheus sees this as an empty label value, and allows to e.g. display graphs of all events without host types (useful for debugging; I tested this manually). For exometer, we just put global as the prefix in such cases.
  • Docs are updated to describe the new metrics. Some mistakes in the docs are corrected as well.
  • Tests are updated to check the new metrics. Some mistakes in the tests are corrected as well.

@mongoose-im

This comment was marked as outdated.

Copy link

codecov bot commented Apr 11, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 85.88%. Comparing base (e89f360) to head (6134f1d).
Report is 12 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #4520      +/-   ##
==========================================
+ Coverage   85.86%   85.88%   +0.01%     
==========================================
  Files         565      565              
  Lines       33803    33795       -8     
==========================================
- Hits        29025    29024       -1     
+ Misses       4778     4771       -7     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@chrzaszcz chrzaszcz force-pushed the simplify-element-metrics branch from 9923835 to 84d7d06 Compare April 11, 2025 12:19
@mongoose-im

This comment was marked as outdated.

@chrzaszcz chrzaszcz force-pushed the simplify-element-metrics branch from 84d7d06 to 9d1a993 Compare April 11, 2025 16:02
@mongoose-im

This comment was marked as outdated.

@chrzaszcz chrzaszcz force-pushed the simplify-element-metrics branch from 9d1a993 to 4540c4f Compare April 11, 2025 16:19
@mongoose-im

This comment was marked as outdated.

@chrzaszcz chrzaszcz force-pushed the simplify-element-metrics branch from 4540c4f to f39f613 Compare April 14, 2025 14:19
@mongoose-im

This comment was marked as outdated.

@chrzaszcz chrzaszcz force-pushed the simplify-element-metrics branch from f39f613 to 262ddac Compare April 16, 2025 07:29
@mongoose-im

This comment was marked as outdated.

@mongoose-im

This comment was marked as outdated.

The new module has two functions:
  - execute_element_event/5 for incoming/outgoing XML elements
  - instrumentation/1 for declaration of events for elements and data

Element events are simplified to xmpp_element_in/out with the following labels:
  - connection_type: c2s, s2s or component
  - host_type: it can be an empty binary if no host type is known,
    e.g. for components, or for stream start errors.
    * for exometer, empty host type is translated to 'global';
    * prometheus treats empty string as no label, so no translation is needed.
Host type is always undefined for components.
It is now easier to collect all XMPP metrics (not only c2s),
and the total traffic is a better estimation of system load,
e.g. someone might only use components, and no c2s.
Also, refactor helpers to be more concise.
There is need to check for the empty host type in the labels,
because components don't have host types assigned to them.
Replace 'global' with host type when necessary.
This is needed, because all XML elements are reported per host type
if the host type is known.
@chrzaszcz chrzaszcz force-pushed the simplify-element-metrics branch from 7da5aab to a631cef Compare April 17, 2025 13:05
@mongoose-im

This comment was marked as outdated.

@mongoose-im

This comment was marked as outdated.

@chrzaszcz chrzaszcz force-pushed the simplify-element-metrics branch from 12263f0 to e42b247 Compare April 22, 2025 12:53
@mongoose-im

This comment was marked as outdated.

@chrzaszcz chrzaszcz force-pushed the simplify-element-metrics branch from e42b247 to ffdaf05 Compare April 22, 2025 13:13
Actually, only one event was tested, because the 3-element tuples were
filtered out.

xx
@chrzaszcz chrzaszcz force-pushed the simplify-element-metrics branch from ffdaf05 to 6a9bcf6 Compare April 22, 2025 13:21
@mongoose-im

This comment was marked as outdated.

Affected metrics:
- xmpp_element_size_in and xmpp_element_out, which replaced
  the 'element' and 'size' metrics.
- c2s_auth_failed, s2s_auth_failed, component_auth_failed

Other changes in the docs:
- Dropped distintion between host-type-specific and global metrics,
  because it was difficult to group metrics by functionality,
  and most metrics were decribed with labels anyway.
- Merged descriptions of data metrics (tcp_data_in, tcp_data_out,
  tls_data_in, tls_data_out) for improved consistency and reduced verbosity.
- Added a new column with labels where labels differ between rows.
@chrzaszcz chrzaszcz force-pushed the simplify-element-metrics branch from c260d7d to 6134f1d Compare April 22, 2025 13:30
@mongoose-im

This comment was marked as outdated.

@mongoose-im

This comment was marked as outdated.

@mongoose-im
Copy link
Collaborator

mongoose-im commented Apr 22, 2025

elasticsearch_and_cassandra_27 / elasticsearch_and_cassandra_mnesia / 6134f1d
Reports root/ big
OK: 683 / Failed: 0 / User-skipped: 72 / Auto-skipped: 0


small_tests_26 / small_tests / 6134f1d
Reports root / small


small_tests_27 / small_tests / 6134f1d
Reports root / small


small_tests_27_arm64 / small_tests / 6134f1d
Reports root / small


ldap_mnesia_26 / ldap_mnesia / 6134f1d
Reports root/ big
OK: 2332 / Failed: 0 / User-skipped: 1187 / Auto-skipped: 0


ldap_mnesia_27 / ldap_mnesia / 6134f1d
Reports root/ big
OK: 2332 / Failed: 0 / User-skipped: 1352 / Auto-skipped: 0


dynamic_domains_pgsql_mnesia_27 / pgsql_mnesia / 6134f1d
Reports root/ big
OK: 5195 / Failed: 0 / User-skipped: 119 / Auto-skipped: 0


dynamic_domains_mysql_redis_27 / mysql_redis / 6134f1d
Reports root/ big
OK: 5160 / Failed: 0 / User-skipped: 154 / Auto-skipped: 0


dynamic_domains_pgsql_mnesia_26 / pgsql_mnesia / 6134f1d
Reports root/ big
OK: 5030 / Failed: 0 / User-skipped: 119 / Auto-skipped: 0


dynamic_domains_mssql_mnesia_27 / odbc_mssql_mnesia / 6134f1d
Reports root/ big
OK: 5190 / Failed: 0 / User-skipped: 124 / Auto-skipped: 0


pgsql_cets_27 / pgsql_cets / 6134f1d
Reports root/ big
OK: 5281 / Failed: 0 / User-skipped: 188 / Auto-skipped: 0


mysql_redis_27 / mysql_redis / 6134f1d
Reports root/ big
OK: 5563 / Failed: 0 / User-skipped: 149 / Auto-skipped: 0


pgsql_mnesia_27 / pgsql_mnesia / 6134f1d
Reports root/ big
OK: 5584 / Failed: 0 / User-skipped: 128 / Auto-skipped: 0


pgsql_mnesia_26 / pgsql_mnesia / 6134f1d
Reports root/ big
OK: 5419 / Failed: 0 / User-skipped: 128 / Auto-skipped: 0


cockroachdb_cets_27 / cockroachdb_cets / 6134f1d
Reports root/ big
OK: 5287 / Failed: 3 / User-skipped: 188 / Auto-skipped: 0

pubsub_SUITE:dag+last_item_cache:send_last_published_item_no_items_test
{error,
  {timeout_when_waiting_for_stanza,
    [{escalus_client,wait_for_stanza,
       [{client,
          <<"alice_send_last_published_item_no_items_test_3749@localhost/res1">>,
          escalus_tcp,<0.118002.0>,
          [{event_manager,<0.117983.0>},
           {server,<<"localhost">>},
           {username,
             <<"alicE_send_last_published_item_no_items_test_3749">>},
           {resource,<<"res1">>}],
          [{event_client,
             [{event_manager,<0.117983.0>},
            {server,<<"localhost">>},
            {username,
              <<"alicE_send_last_published_item_no_items_test_3749">>},
            {resource,<<"res1">>}]},
           {resource,<<"res1">>},
           {username,
             <<"alice_send_last_published_item_no_items_test_3749">>},
           {server,<<"localhost">>},
           {host,<<"localhost">>},
           {port,5222},
           {auth,fun escalus_auth:auth_plain/2},
           {wspath,undefined},
           {username,
             <<"alicE_send_last_published_item_no_items_test_3749">>},
           {server,<<"localhost">>},
           {password,<<"matygrysa">>},
           {stream_id,<<"901c37e7dcc3699b">>}]},
        5000],
       [{file,
          "/home/circleci/project/big_tests/_build/default/lib/escalus/src/escalus_client.erl"},
        {line,136}]},
     {pubsub_tools,receive_response,3,
       [{file,"/home/circleci/project/big_tests/tests/pubsub_tools.erl"},
        {line,434}]},
     {pubsub_tools,receive_and_c...

Report log

pubsub_SUITE:dag+last_item_cache:send_last_published_item_no_items_test
{error,
  {timeout_when_waiting_for_stanza,
    [{escalus_client,wait_for_stanza,
       [{client,
          <<"alice_send_last_published_item_no_items_test_3754@localhost/res1">>,
          escalus_tcp,<0.118081.0>,
          [{event_manager,<0.118069.0>},
           {server,<<"localhost">>},
           {username,
             <<"alicE_send_last_published_item_no_items_test_3754">>},
           {resource,<<"res1">>}],
          [{event_client,
             [{event_manager,<0.118069.0>},
            {server,<<"localhost">>},
            {username,
              <<"alicE_send_last_published_item_no_items_test_3754">>},
            {resource,<<"res1">>}]},
           {resource,<<"res1">>},
           {username,
             <<"alice_send_last_published_item_no_items_test_3754">>},
           {server,<<"localhost">>},
           {host,<<"localhost">>},
           {port,5222},
           {auth,fun escalus_auth:auth_plain/2},
           {wspath,undefined},
           {username,
             <<"alicE_send_last_published_item_no_items_test_3754">>},
           {server,<<"localhost">>},
           {password,<<"matygrysa">>},
           {stream_id,<<"dfee566c7a916ba4">>}]},
        5000],
       [{file,
          "/home/circleci/project/big_tests/_build/default/lib/escalus/src/escalus_client.erl"},
        {line,136}]},
     {pubsub_tools,receive_response,3,
       [{file,"/home/circleci/project/big_tests/tests/pubsub_tools.erl"},
        {line,434}]},
     {pubsub_tools,receive_and_c...

Report log

pubsub_SUITE:dag+last_item_cache:send_last_published_item_no_items_test
{error,
  {timeout_when_waiting_for_stanza,
    [{escalus_client,wait_for_stanza,
       [{client,
          <<"alice_send_last_published_item_no_items_test_3756@localhost/res1">>,
          escalus_tcp,<0.118158.0>,
          [{event_manager,<0.118146.0>},
           {server,<<"localhost">>},
           {username,
             <<"alicE_send_last_published_item_no_items_test_3756">>},
           {resource,<<"res1">>}],
          [{event_client,
             [{event_manager,<0.118146.0>},
            {server,<<"localhost">>},
            {username,
              <<"alicE_send_last_published_item_no_items_test_3756">>},
            {resource,<<"res1">>}]},
           {resource,<<"res1">>},
           {username,
             <<"alice_send_last_published_item_no_items_test_3756">>},
           {server,<<"localhost">>},
           {host,<<"localhost">>},
           {port,5222},
           {auth,fun escalus_auth:auth_plain/2},
           {wspath,undefined},
           {username,
             <<"alicE_send_last_published_item_no_items_test_3756">>},
           {server,<<"localhost">>},
           {password,<<"matygrysa">>},
           {stream_id,<<"137e3369f3f6b13a">>}]},
        5000],
       [{file,
          "/home/circleci/project/big_tests/_build/default/lib/escalus/src/escalus_client.erl"},
        {line,136}]},
     {pubsub_tools,receive_response,3,
       [{file,"/home/circleci/project/big_tests/tests/pubsub_tools.erl"},
        {line,434}]},
     {pubsub_tools,receive_and_c...

Report log


mssql_mnesia_27 / odbc_mssql_mnesia / 6134f1d
Reports root/ big
OK: 5579 / Failed: 0 / User-skipped: 133 / Auto-skipped: 0

@chrzaszcz chrzaszcz changed the title Simplify XMPP element metrics Simplify/unify XMPP element metrics Apr 22, 2025
@chrzaszcz chrzaszcz marked this pull request as ready for review April 22, 2025 14:07
Copy link
Collaborator

@NelsonVides NelsonVides left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Absolutely amazing, this is what I dreamed this should look like! 🎉

@NelsonVides NelsonVides merged commit 321006a into master Apr 22, 2025
4 checks passed
@NelsonVides NelsonVides deleted the simplify-element-metrics branch April 22, 2025 15:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants